Efficient AUC Optimization for Information Ranking Applications
Adequate evaluation of an information retrieval system to estimate future
performance is a crucial task. Area under the ROC curve (AUC) is widely used to
evaluate the generalization of a retrieval system. However, the objective
function optimized in many retrieval systems is the error rate and not the AUC
value. This paper provides an efficient and effective non-linear approach to
optimize AUC using additive regression trees, with a special emphasis on the
use of multi-class AUC (MAUC) because multiple relevance levels are widely used
in many ranking applications. Compared to a conventional linear approach, the
performance of the non-linear approach is comparable on binary-relevance
benchmark datasets and is better on multi-relevance benchmark datasets.
Comment: 12 pages
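To make the gap between error rate and AUC concrete, here is a minimal sketch of binary AUC computed as the fraction of correctly ordered (relevant, non-relevant) pairs. It is an illustration only, not the paper's tree-based optimizer and not its multi-class MAUC; the function names `pairwise_auc` and `error_rate`, the 0.5 threshold, and the toy scores are all assumptions.

```python
from itertools import product


def pairwise_auc(scores, labels):
    """Binary AUC: share of (positive, negative) pairs where the positive
    example receives the higher score; tied pairs count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def error_rate(scores, labels, threshold=0.5):
    """Fraction of examples misclassified when thresholding the score.
    The threshold is an assumption made for this illustration."""
    wrong = sum((s >= threshold) != (y == 1) for s, y in zip(scores, labels))
    return wrong / len(labels)


# Hypothetical toy data: every score exceeds the threshold, so the
# classifier errs on both negatives, yet most pairs are ordered correctly.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
print(pairwise_auc(scores, labels), error_rate(scores, labels))
```

The two quantities respond to different things: AUC depends only on the ordering of the scores, while the error rate depends on where the threshold falls, which is why optimizing one does not automatically optimize the other.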
On Making Good Games - Using Player Virtue Ethics and Gameplay Design Patterns to Identify Generally Desirable Gameplay Features
This paper uses a framework of player virtues to perform a theoretical
exploration of what is required to make a game good. The choice of player
virtues is based upon the view that games can be seen as implements, and that
these are good if they support an intended use, and the intended use of games
is to support people in being good players. A collection of gameplay design
patterns, identified through their relation to the virtues, is presented to
provide specific starting points for considering design options for this type
of good game. Twenty-four patterns are identified as supporting the virtues,
including RISK/REWARD, DYNAMIC ALLIANCES, GAME MASTERS, and PLAYER DECIDED
RESULTS, as are seven countering three or more virtues, including ANALYSIS
PARALYSIS, EARLY ELIMINATION, and GRINDING. The paper concludes by identifying
limitations of the approach as well as by showing how it can be applied using
other views of what are preferable features in games.
Learning what matters - Sampling interesting patterns
In the field of exploratory data mining, local structure in data can be
described by patterns and discovered by mining algorithms. Although many
solutions have been proposed to address the redundancy problems in pattern
mining, most of them either provide succinct pattern sets or take the interests
of the user into account, but not both. Consequently, the analyst has to invest
substantial effort in identifying those patterns that are relevant to her
specific interests and goals. To address this problem, we propose a novel
approach that combines pattern sampling with interactive data mining. In
particular, we introduce the LetSIP algorithm, which builds upon recent
advances in 1) weighted sampling in SAT and 2) learning to rank in interactive
pattern mining. Specifically, it exploits user feedback to directly learn the
parameters of the sampling distribution that represents the user's interests.
We compare the performance of the proposed algorithm to the state-of-the-art in
interactive pattern mining by emulating the interests of a user. The resulting
system allows efficient and interleaved learning and sampling, and thus
user-specific anytime data exploration. Finally, LetSIP demonstrates favourable
trade-offs concerning both quality-diversity and exploitation-exploration when
compared to existing methods.
Comment: PAKDD 2017, extended version
Flexible constrained sampling with guarantees for pattern mining
Pattern sampling has been proposed as a potential solution to the infamous
pattern explosion. Instead of enumerating all patterns that satisfy the
constraints, individual patterns are sampled proportional to a given quality
measure. Several sampling algorithms have been proposed, but each of them has
its limitations when it comes to 1) flexibility in terms of quality measures
and constraints that can be used, and/or 2) guarantees with respect to sampling
accuracy. We therefore present Flexics, the first flexible pattern sampler that
supports a broad class of quality measures and constraints, while providing
strong guarantees regarding sampling accuracy. To achieve this, we leverage the
perspective on pattern mining as a constraint satisfaction problem and build
upon the latest advances in sampling solutions in SAT as well as existing
pattern mining algorithms. Furthermore, the proposed algorithm is applicable to
a variety of pattern languages, which allows us to introduce and tackle the
novel task of sampling sets of patterns. We introduce and empirically evaluate
two variants of Flexics: 1) a generic variant that addresses the well-known
itemset sampling task and the novel pattern set sampling task as well as a wide
range of expressive constraints within these tasks, and 2) a specialized
variant that exploits existing frequent itemset techniques to achieve
substantial speed-ups. Experiments show that Flexics is both accurate and
efficient, making it a useful tool for pattern-based data exploration.
Comment: Accepted for publication in Data Mining & Knowledge Discovery journal
(ECML/PKDD 2017 journal track)
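The sampling target described in this abstract (patterns drawn with probability proportional to a quality measure) can be illustrated with a deliberately naive sketch. It enumerates every frequent itemset first, which is exactly what a real pattern sampler is meant to avoid, so it is not Flexics or any SAT-based method; it assumes frequency as the quality measure, and the helper names and toy transactions are hypothetical.

```python
import random
from itertools import combinations


def frequency(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def enumerate_patterns(transactions, min_freq):
    """Brute-force enumeration of itemsets meeting a minimum-frequency
    constraint. Exponential in the number of items; toy data only."""
    items = sorted(set().union(*transactions))
    patterns = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            f = frequency(candidate, transactions)
            if f >= min_freq:
                patterns[candidate] = f
    return patterns


def sample_patterns(patterns, n, seed=0):
    """Draw n patterns (with replacement), each with probability
    proportional to its quality, here its frequency."""
    rng = random.Random(seed)
    population = list(patterns)
    weights = [patterns[p] for p in population]
    return rng.choices(population, weights=weights, k=n)


# Hypothetical toy dataset of four transactions.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
patterns = enumerate_patterns(transactions, min_freq=2)
for p in sample_patterns(patterns, n=5):
    print(sorted(p), patterns[p])
```

On realistic data the enumeration step is infeasible because of the pattern explosion the abstract mentions, which is the motivation for samplers that draw from the same distribution without materializing every pattern.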
An automatic critical care urine meter
Nowadays patients admitted to critical care units have most of their physiological parameters measured automatically by sophisticated commercial monitoring devices. More often than not, these devices supervise whether the values of the parameters they measure lie within a pre-established range, and issue warnings of deviations from this range by triggering alarms. The automation of measuring and supervising tasks not only relieves the healthcare staff of a considerable workload but also avoids human errors in these repetitive and monotonous tasks. Arguably, the most relevant physiological parameter that is still measured and supervised manually by critical care unit staff is urine output (UO). In this paper we present a patent-pending device that provides continuous and accurate measurements of a patient's UO. The device uses capacitive sensors to take continuous measurements of the height of the column of liquid accumulated in two chambers that make up a plastic container. The first chamber, into which the urine flows, has a small volume. Once filled, it overflows into a second, larger chamber. The first chamber provides accurate UO measures of patients whose UO has to be closely supervised, while the second one avoids the need for frequent interventions by the nursing staff to empty the container.
Like trainer, like bot? Inheritance of bias in algorithmic content moderation
The internet has become a central medium through which `networked publics'
express their opinions and engage in debate. Offensive comments and personal
attacks can inhibit participation in these spaces. Automated content moderation
aims to overcome this problem using machine learning classifiers trained on
large corpora of texts manually annotated for offence. While such systems could
help encourage more civil debate, they must navigate inherently normatively
contestable boundaries, and are subject to the idiosyncratic norms of the human
raters who provide the training data. An important objective for platforms
implementing such measures might be to ensure that they are not unduly biased
towards or against particular norms of offence. This paper provides some
exploratory methods by which the normative biases of algorithmic content
moderation systems can be measured, by way of a case study using an existing
dataset of comments labelled for offence. We train classifiers on comments
labelled by different demographic subsets (men and women) to understand how
differences in conceptions of offence between these groups might affect the
performance of the resulting models on various test sets. We conclude by
discussing some of the ethical choices facing the implementers of algorithmic
moderation systems, given various desired levels of diversity of viewpoints
amongst discussion participants.
Comment: 12 pages, 3 figures, 9th International Conference on Social
Informatics (SocInfo 2017), Oxford, UK, 13–15 September 2017 (forthcoming in
Springer Lecture Notes in Computer Science)
Space-Time Structure of Loop Quantum Black Hole
In this paper we have improved the semiclassical analysis of loop quantum
black hole (LQBH) in the conservative approach of constant polymeric parameter.
In particular we have focused our attention on the space-time structure. We
have introduced a very simple modification of the spherically symmetric
Hamiltonian constraint in its holonomic version. The new quantum constraint
reduces to the classical constraint when the polymeric parameter goes to
zero. Using this modification we have obtained a large class of semiclassical
solutions parametrized by a generic function of the polymeric parameter. We
have found that only a particular choice of this function reproduces the black
hole solution with the correct asymptotically flat limit. At r=0 the
semiclassical metric is regular and the Kretschmann invariant has a maximum
peaked in L-Planck. The radial position of the peak does not depend on the
black hole
mass and the polymeric parameter. The semiclassical solution is very similar to
the Reissner-Nordstrom metric. We have constructed the Carter-Penrose diagrams
explicitly, giving a causal description of the space-time and its maximal
extension. The LQBH metric interpolates between two asymptotically flat
regions, the r to infinity region and the r to 0 region. We have studied the
thermodynamics of the semiclassical solution. The temperature, entropy and the
evaporation process are regular and could be defined independently from the
polymeric parameter. We have studied the particular metric when the polymeric
parameter goes to zero. This metric is regular at r=0 and has only one
event horizon at r = 2m. The Kretschmann invariant maximum depends only on
L-Planck. The polymeric parameter does not play any role in the black hole
singularity resolution. The thermodynamics is the same.
Comment: 17 pages, 19 figures
CICLAD: A Fast and Memory-efficient Closed Itemset Miner for Streams
Mining association rules from data streams is a challenging task due to the
(typically) limited resources available vs. the large size of the result.
Frequent closed itemsets (FCI) enable an efficient first step, yet current FCI
stream miners are not optimal on resource consumption, e.g. they store a large
number of extra itemsets at an additional cost. In a search for a better
storage-efficiency trade-off, we designed Ciclad, an intersection-based
sliding-window FCI miner. Leveraging in-depth insights into FCI evolution, it
combines minimal storage with quick access. Experimental results indicate that
Ciclad's memory footprint is much lower and its performance globally better
than that of competitor methods.
Comment: KDD2
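As background for the term "closed itemset", the following minimal sketch uses the standard definition: an itemset is closed when it equals the intersection of all transactions that contain it. This is a static, brute-force illustration and not Ciclad's sliding-window algorithm; the function names and toy transactions are assumptions.

```python
def closure(itemset, transactions):
    """Intersection of all transactions containing the itemset, or None
    if no transaction contains it."""
    covering = [frozenset(t) for t in transactions if itemset <= t]
    if not covering:
        return None
    return frozenset.intersection(*covering)


def is_closed(itemset, transactions):
    """An itemset is closed if it equals its own closure, i.e. no proper
    superset occurs in exactly the same transactions."""
    return closure(itemset, transactions) == itemset


# Hypothetical toy window of four transactions.
window = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
for candidate in ({"a"}, {"c"}, {"a", "c"}):
    fs = frozenset(candidate)
    print(sorted(fs), "closed" if is_closed(fs, window) else "not closed")
```

In this toy window {c} is not closed because every transaction containing c also contains a, so {a, c} carries the same support information with one entry instead of two.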
New perspectives on the ecology of tree structure and tree communities through terrestrial laser scanning
Terrestrial laser scanning (TLS) opens up the possibility of describing the three-dimensional structures of trees in natural environments with unprecedented detail and accuracy. It is already being extensively applied to describe how ecosystem biomass and structure vary between sites, but can also facilitate major advances in developing and testing mechanistic theories of tree form and forest structure, thereby enabling us to understand why trees and forests have the biomass and three-dimensional structure they do. Here we focus on the ecological challenges and benefits of understanding tree form, and highlight some advances related to capturing and describing tree shape that are becoming possible with the advent of TLS. We present examples of ongoing work that applies, or could potentially apply, new TLS measurements to better understand the constraints on optimization of tree form. Theories of resource distribution networks, such as metabolic scaling theory, can be tested and further refined. TLS can also provide new approaches to the scaling of woody surface area and crown area, and thereby better quantify the metabolism of trees. Finally, we demonstrate how we can develop a more mechanistic understanding of the effects of avoidance of wind risk on tree form and maximum size. Over the next few years, TLS promises to deliver both major empirical and conceptual advances in the quantitative understanding of trees and tree-dominated ecosystems, leading to advances in understanding the ecology of why trees and ecosystems look and grow the way they do
Redundancy, Deduction Schemes, and Minimum-Size Bases for Association Rules
Association rules are among the most widely employed data analysis methods in
the field of Data Mining. An association rule is a form of partial implication
between two sets of binary variables. In the most common approach, association
rules are parameterized by a lower bound on their confidence, which is the
empirical conditional probability of their consequent given the antecedent,
and/or by some other parameter bounds such as "support" or deviation from
independence. We study here notions of redundancy among association rules from
a fundamental perspective. We see each transaction in a dataset as an
interpretation (or model) in the propositional logic sense, and consider
existing notions of redundancy, that is, of logical entailment, among
association rules, of the form "any dataset in which this first rule holds must
obey also that second rule, therefore the second is redundant". We discuss
several existing alternative definitions of redundancy between association
rules and provide new characterizations and relationships among them. We show
that the main alternatives we discuss actually correspond to just two variants,
which differ in the treatment of full-confidence implications. For each of
these two notions of redundancy, we provide a sound and complete deduction
calculus, and we show how to construct complete bases (that is,
axiomatizations) of absolutely minimum size in terms of the number of rules.
Finally, we explore an approach to redundancy with respect to several
association rules, and fully characterize its simplest case of two partial
premises.
Comment: LMCS accepted paper
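The confidence and support parameters mentioned in this abstract can be computed directly from a transaction dataset. The sketch below follows the abstract's definition of confidence as the empirical conditional probability of the consequent given the antecedent; it does not implement the paper's redundancy notions or deduction calculi, and the function names and toy data are assumptions.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Empirical conditional probability of the consequent given the
    antecedent, over the transactions that contain the antecedent."""
    covered = [t for t in transactions if antecedent <= t]
    if not covered:
        raise ValueError("antecedent does not occur in any transaction")
    return sum(1 for t in covered if consequent <= t) / len(covered)


# Hypothetical toy dataset of four transactions.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]

# A partial rule: a -> b holds in two of the three transactions with a.
print(confidence({"a"}, {"b"}, transactions))

# A full-confidence implication: every transaction with c also has a.
print(confidence({"c"}, {"a"}, transactions))
```

The second rule is an example of the full-confidence implications whose treatment, according to the abstract, distinguishes the two redundancy variants.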